Big Scale Text Analytics and Smart Content Navigation

Authors

  • Karsten Schmidt
  • Sebastian Bächle
  • Philipp Scholl
  • Georg Nold
Abstract

Identifying and exploring relevant content in growing document collections is a challenge for researchers, users, and system providers alike. Supporting this is crucial for companies offering knowledge in the form of documents as their core product. Our demo shows an intelligent way of doing guided research in big text collections, using the collection of the major scientific publisher Springer SBM as an example data set. We use the SAP HANA platform for flexible text analysis, ad-hoc calculations and data linkage, in order to enhance the experience of users navigating and exploring publications. We integrate unstructured data (textual documents) and structured data (document metadata and web server logs), and provide interactive filters in order to enable a responsive user experience while searching for relevant content. With HANA, we are able to implement this functionality over big data on a single machine by making use of HANA’s SQL data store and the built-in application server.

Similar resources

Application of Big Data Analytics in Power Distribution Network

Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...

Guest Editorial: Big Data Analytics and the Web

The paper by Shao et al., “Clustering Big Spatiotemporal-Interval Data,” focuses on clustering big spatiotemporal data, which are common in the emerging Web of Things (WoT), where a large number of sensors are deployed for continuously collecting data. The authors explore a novel way to cluster massive Web data with spatiotemporal intervals in multiple Euclidean spaces, as well as a new energy f...

Big Data Quality: From Content to Context

Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. There are many views and streams in DIQ research, generally aiming at improving the effectiveness of decision making in organizations. Although there is a lot of research aimed at clarifying the role of BIG data...

A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection

Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied to big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not gone unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....

Integrating Modeling Languages and Web Logs for Enhanced User Behavior Analytics

While basic Web analytics tools are widespread and provide statistics about Web site navigation, no approaches exist for merging such statistics with information about the Web application structure, content and semantics. We demonstrate the advantages of combining Web application models with runtime navigation logs, for the purpose of deepening the understanding of user behaviour. We propose a ...

Publication date: 2014